(57) Abstract:

PROBLEM TO BE SOLVED: To provide a voice recognition device which adjusts the number of element
distributions effectively and at high speed for a probability model using a mixture distribution.
SOLUTION: This voice recognition device uses a probability model based on a mixture distribution
and is composed of a standard pattern storage means 103 which holds a standard pattern, a
recognition means 104 which inputs a voice and outputs the recognition result by using the
standard pattern, a standard pattern generating means 102 which inputs a voice for learning and
generates the standard pattern, and a standard pattern adjusting means 203 which adjusts the
number of element distributions of the mixture distribution of the standard pattern.
Consequently, in voice recognition using a hidden Markov model having a Gaussian mixture
distribution as the output probability distribution, a tree structure of element distributions is
generated for each state, and the number of element distributions of each state is adjusted by
using an information criterion.
* NOTICES *

Japan Patent Office is not responsible for any 
damages caused by the use of this translation. 

1. This document has been translated by computer. So the translation may not reflect the original 
precisely. 

2. **** shows the word which can not be translated. 
3. In the drawings, any words are not translated. 



CLAIMS 



[Claim(s)] 

[Claim 1] A voice recognition unit using a probability model based on a mixture distribution,
characterized by comprising: a standard-pattern storage means which holds a standard pattern; a
recognition means which takes voice as input and outputs a recognition result using the standard
pattern; a standard-pattern creation means which takes training voice as input and creates the
standard pattern; and a standard-pattern adjustment means which adjusts the number of element
distributions of the mixture distribution of the standard pattern.

[Claim 2] A voice recognition unit using a probability model based on a mixture distribution,
characterized by comprising: a standard-pattern storage means which holds a standard pattern; a
recognition means which takes voice as input and outputs a recognition result using the standard
pattern; a standard-pattern correction means which takes adaptation voice as input and corrects
the standard pattern; and a standard-pattern adjustment means which adjusts the number of element
distributions of the mixture distribution of the standard pattern.

[Claim 3] A voice recognition unit according to claim 1 or 2, characterized in that said
standard-pattern adjustment means comprises a tree structure creation means which creates a tree
structure of the element distributions, and an element distribution selection means which selects
distributions using training data as input.

[Claim 4] A voice recognition unit according to claim 1 or 2, characterized in that said
standard-pattern adjustment means comprises a minimax distribution selection means which uses a
minimax method to select element distributions.

[Claim 5] A voice recognition unit according to claim 3, characterized in that said element
distribution selection means uses the amount of training data corresponding to each element
distribution as the selection criterion when selecting element distributions.

[Claim 6] A voice recognition unit according to claim 3, characterized in that said element
distribution selection means uses the minimum description length criterion as the selection
criterion when selecting element distributions.

[Claim 7] A voice recognition unit according to claim 3, characterized in that said element
distribution selection means uses the Akaike information criterion as the selection criterion when
selecting element distributions.

[Claim 8] A voice recognition unit according to claim 3, characterized in that said tree structure
creation means uses divergence as the distance between distributions when selecting element
distributions.

[Claim 9] A voice recognition unit according to claim 3, characterized in that said tree structure
creation means uses the likelihood with respect to the training data as the distance between
distributions.

[Claim 10] A voice recognition unit according to any one of claims 1 to 9, characterized by using
a hidden Markov model as the probability model using a mixture distribution.
DETAILED DESCRIPTION



[Detailed Description of the Invention] 
[0001] 

[Technical field to which the invention belongs] This invention relates to a method of creating
standard patterns in pattern recognition using mixture distributions, and in particular to a voice
recognition unit using a hidden Markov model that uses a Gaussian mixture distribution (mixed
Gaussian distribution) as its output probability distribution.
[0002] 

[Description of the Prior Art] In recent years, machine recognition of voice patterns has been
studied and many methods have been proposed. A typical one is the method using a hidden Markov
model (HMM). Among voice recognition systems using HMMs, speaker-independent recognition systems,
whose purpose is to recognize the voice of any speaker, are being actively researched and
developed.

[0003] Hereafter, a voice recognition system is explained based on drawing 2, taking the HMM as
an example. The utterance of a speaker input to the voice recognition unit is passed to the input
pattern creation means 101 and, through processes such as A/D conversion and voice analysis, is
converted into a time series of feature vectors, one for each unit of fixed time length called a
frame. This time series of feature vectors is here called the input pattern. The frame length is
usually about 10 to 100 ms. A feature vector is obtained by extracting the features of the voice
spectrum at that time, and usually has 10 to 100 dimensions.

[0004] The HMMs are stored in the standard-pattern storage means 103. An HMM is a model of the
source of the voice, and its parameters can be learned from speakers' voices. The HMM is described
in detail in the explanation of the recognition means. An HMM is usually prepared for each
recognition unit; here a phoneme is taken as the recognition unit. In a speaker-independent
recognition system, for example, speaker-independent HMMs trained beforehand on the utterances of
many speakers are used as the HMMs in the standard-pattern storage means.

[0005] The recognition means 104 recognizes the input pattern using word HMMs. An HMM is a model
of the voice source and is a statistical, probabilistic model, so that it can cope with the
various fluctuations of voice patterns. A detailed explanation of HMMs can be found in Rabiner and
Juang, translated by Furui, "Fundamentals of Speech Recognition (Vol. 2)", NTT Advanced Technology
(1995), pp. 102-187 (hereafter, reference 1).
[0006] The HMM of each phoneme usually consists of one to ten states and the state transitions
between them. Usually an initial state and a final state are defined; at every unit time, a symbol
is output from each state and a state transition takes place. The voice of each phoneme is
expressed as the time series of symbols output from the HMM during the state transitions from the
initial state to the final state.

[0007] An output probability of symbols is defined for each state, and a transition probability is
defined for each transition between states. The transition probability parameters express the
temporal fluctuation of a voice pattern. The output probability parameters express the fluctuation
of a voice pattern in voice quality. By setting the probability of the initial state to a certain
value and multiplying by the output probability and the transition probability at each state
transition, the probability that an utterance is generated from the model can be obtained.

[0008] Conversely, when an utterance is observed and is assumed to have been generated by a
certain HMM, its probability of occurrence can be calculated. In voice recognition by HMM, an HMM
is prepared for each recognition candidate; when an utterance is input, the probability of
occurrence is obtained for each HMM, the HMM giving the maximum is taken to be the generation
source, and the recognition candidate corresponding to that HMM is taken as the recognition result.

[0009] Output probability parameters can be expressed by discrete probability distributions or by
continuous probability distributions; the continuous expression is taken as the example here. In
the continuous expression, a Gaussian mixture distribution, that is, a weighted sum of several
Gaussian distributions, is often used. In the following examples the output probability is a
continuous Gaussian mixture distribution. Parameters such as the output probability parameters,
the transition probability parameters and the weighting factors of the Gaussian distributions are
learned beforehand by the Baum-Welch algorithm from training voice corresponding to the model.

[0010] For example, suppose that 1000 words are the recognition targets, that is, the one correct
word is to be found among 1000 recognition candidate words. To recognize a word, the HMMs of the
phonemes are first concatenated to create the HMM of each recognition candidate word. For
1000-word recognition, word HMMs are created for the 1000 words. The input pattern O, expressed as
a time series of feature vectors, is given by equation (1) below.
[Equation 1]

$$O = o_1, o_2, o_3, \ldots, o_t, \ldots, o_T \qquad (1)$$

Here, T is the total number of frames of the input pattern.

[0011] The recognition candidate words are denoted $W_1, W_2, \ldots, W_N$, where N is the number
of recognition candidate words. Matching between the word HMM of each recognition candidate word
$W_n$ and the input pattern O is performed as follows. In the following explanation the subscript
n is omitted unless it is needed. First, in the word HMM, let $a_{ji}$ be the transition
probability from state j to state i, $c_{im}$ the mixture weight of the output probability
distribution, $\mu_{im}$ the mean vector of each element Gaussian distribution, and $\Sigma_{im}$
its covariance matrix. Here t denotes the input time, i and j denote states of the HMM, and m
denotes the mixture element number. The following recurrence for the forward probability
$\alpha_t(i)$ is then computed.
[0012] The forward probability $\alpha_t(i)$ is the probability of outputting the partial
observation sequence $o_1, o_2, \ldots, o_t$ and being in state i at time t.
[Equation 2]

$$\alpha_1(i) = \pi_i\, b_i(o_1) \qquad (i = 1, 2, \ldots, I) \qquad (2)$$

[Equation 3]

$$\alpha_{t+1}(i) = \sum_j \alpha_t(j)\, a_{ji}\, b_i(o_{t+1}) \qquad (i = 1, \ldots, I;\ t = 1, \ldots, T-1) \qquad (3)$$

Here, $\pi_i$ is the probability that the initial state is i.

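[0013] Moreover, $b_i(o_t)$ in equation (3) is defined by equations (4) and (5) below.
[Equation 4]

$$b_i(o_t) = \sum_m c_{im}\, N(o_t; \mu_{im}, \Sigma_{im}) \qquad (4)$$

[Equation 5]

$$N(o_t; \mu_{im}, \Sigma_{im}) = (2\pi)^{-K/2}\, |\Sigma_{im}|^{-1/2} \exp\!\left(-\tfrac{1}{2}(o_t - \mu_{im})^{\top} \Sigma_{im}^{-1} (o_t - \mu_{im})\right) \qquad (5)$$

In equation (5), K is the number of dimensions of the input frame and of the mean vector.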
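The forward computation described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the invention: it assumes diagonal covariance matrices, the standard forward initialisation $\pi_i b_i(o_1)$, and works with raw probabilities (a practical system would use scaling or log probabilities to avoid underflow). The function names `gaussian_pdf`, `emission` and `forward` are hypothetical.

```python
import math

def gaussian_pdf(o, mean, var):
    """Diagonal-covariance Gaussian density N(o; mean, diag(var)) (assumption)."""
    k = len(o)
    log_det = sum(math.log(v) for v in var)
    quad = sum((x - m) ** 2 / v for x, m, v in zip(o, mean, var))
    return math.exp(-0.5 * (k * math.log(2.0 * math.pi) + log_det + quad))

def emission(o, weights, means, variances):
    """Mixture output probability b_i(o) = sum_m c_im N(o; mu_im, Sigma_im)."""
    return sum(c * gaussian_pdf(o, mu, var)
               for c, mu, var in zip(weights, means, variances))

def forward(obs, pi, a, gmms):
    """Return alpha[t][i] with a 0-based time index.

    a[j][i] is the transition probability from state j to state i;
    gmms[i] is a (weights, means, variances) triple for state i.
    """
    n = len(pi)
    alpha = [[pi[i] * emission(obs[0], *gmms[i]) for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([
            sum(prev[j] * a[j][i] for j in range(n)) * emission(o, *gmms[i])
            for i in range(n)
        ])
    return alpha
```

The likelihood of a word model is then read from the last row of `alpha` at the final state.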

[0014] Moreover, the likelihood of the input pattern for the word $W_n$ is obtained by equation
(6) below.
[Equation 6]

$$P^n(X) = \alpha_T(I) \qquad (6)$$

In equation (6), I is the final state.

[0015] This processing is performed for each word model, and the recognition result word
$W_{\hat{n}}$ for the input pattern X is obtained by equation (7) below.
[Equation 7]

$$\hat{n} = \operatorname*{argmax}_n P^n(X) \qquad (7)$$

The recognition result word $W_{\hat{n}}$ is sent to the recognition result output section. The
recognition result output section performs processing such as displaying the recognition result on
a screen or sending a control command corresponding to the recognition result to another device.

[0016] Next, the standard-pattern creation means 102 is explained. In speaker-independent
recognition, the standard-pattern creation means 102 accumulates utterances of many speakers in
advance and estimates the parameters using those utterances. First, the backward probability is
introduced by equations (8) and (9) below.

[Equation 8]

$$\beta_T(i) = 1 \qquad (i = 1, \ldots, I) \qquad (8)$$

[Equation 9]

$$\beta_t(i) = \sum_j a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j) \qquad (t = T-1, T-2, \ldots, 1;\ i = 1, \ldots, I) \qquad (9)$$

In equation (9), $\beta_t(i)$ is the probability of the partial observation sequence from time t+1
to the end, given that the model is in state i at time t.

[0017] Given the observation sequence O, the probability of being in state i at time t is given,
using the forward and backward probabilities, by equation (10) below.
[Equation 10]

$$\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i'} \alpha_t(i')\, \beta_t(i')} \qquad (10)$$

Moreover, the probability of being in state i at time t and in state j at time t+1 is given by
equation (11) below.
[Equation 11]

$$\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i'} \sum_{j'} \alpha_t(i')\, a_{i'j'}\, b_{j'}(o_{t+1})\, \beta_{t+1}(j')} \qquad (11)$$

Moreover, in the case of a Gaussian mixture distribution, the probability of being in the k-th
mixture element of state i at time t (the occupancy frequency) is given by equation (12) below.
[Equation 12]

$$\gamma'_t(i,k) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i'} \alpha_t(i')\, \beta_t(i')} \cdot \frac{c_{ik}\, N(o_t; \mu_{ik}, \Sigma_{ik})}{\sum_m c_{im}\, N(o_t; \mu_{im}, \Sigma_{im})} \qquad (12)$$
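As an illustration of the forward and backward probabilities and the occupancy probabilities, a minimal sketch is given below. It assumes that the output probabilities $b_i(o_t)$ have already been evaluated into a table `b[t][i]`, uses raw probabilities without scaling, and uses hypothetical function names; it is not the implementation of the invention.

```python
def forward_backward(pi, a, b):
    """Forward and backward probabilities for one observation sequence.

    pi[i]   : initial state probability
    a[i][j] : transition probability from state i to state j
    b[t][i] : output probability of frame t in state i (precomputed, assumption)
    """
    T, n = len(b), len(pi)
    alpha = [[pi[i] * b[0][i] for i in range(n)]]
    for t in range(1, T):
        alpha.append([
            sum(alpha[t - 1][j] * a[j][i] for j in range(n)) * b[t][i]
            for i in range(n)
        ])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(a[i][j] * b[t + 1][j] * beta[t + 1][j]
                             for j in range(n))
    return alpha, beta

def state_occupancy(alpha, beta):
    """gamma_t(i): probability of being in state i at time t."""
    gamma = []
    for a_t, b_t in zip(alpha, beta):
        norm = sum(x * y for x, y in zip(a_t, b_t))
        gamma.append([x * y / norm for x, y in zip(a_t, b_t)])
    return gamma

def component_occupancy(gamma_ti, weighted_densities):
    """Split gamma_t(i) across mixture elements.

    weighted_densities[k] holds c_ik * N(o_t; mu_ik, Sigma_ik).
    """
    total = sum(weighted_densities)
    return [gamma_ti * w / total for w in weighted_densities]
```

A useful sanity check is that the sum over states of `alpha[t][i] * beta[t][i]` is the same for every t, since it equals the likelihood of the whole sequence.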



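[0018] Based on the values calculated above, the estimates of $\pi$, a, $\mu$, $\Sigma$ and c are
given by equations (13) to (17) below.
[Equation 13]

$$\bar{\pi}_i = \gamma_1(i) \qquad (13)$$

[Equation 14]

$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \qquad (14)$$

[Equation 15]

$$\bar{c}_{ik} = \frac{\sum_{t=1}^{T} \gamma'_t(i,k)}{\sum_{t=1}^{T} \sum_{k} \gamma'_t(i,k)} \qquad (15)$$

[Equation 16]

$$\bar{\mu}_{ik} = \frac{\sum_{t=1}^{T} \gamma'_t(i,k)\, o_t}{\sum_{t=1}^{T} \gamma'_t(i,k)} \qquad (16)$$

[Equation 17]

$$\bar{\Sigma}_{ik} = \frac{\sum_{t=1}^{T} \gamma'_t(i,k)\, (o_t - \mu_{ik})(o_t - \mu_{ik})^{\top}}{\sum_{t=1}^{T} \gamma'_t(i,k)} \qquad (17)$$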

[0019] In the Baum-Welch algorithm, the parameters are updated based on these estimates, and the
estimates are then recomputed using the updated parameters; this is repeated. It has been proved
that the probability of generating the observation sequence increases with every iteration. The
above has explained a conventional voice recognition unit, taking the case of using HMMs as the
example.
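One re-estimation step for the mixture weights, means and variances of a single state can be sketched as follows. This is an illustrative sketch under stated assumptions: diagonal covariances, occupancies already computed for one utterance, and variances taken around the newly estimated mean. The name `reestimate_state` is hypothetical.

```python
def reestimate_state(obs, occ):
    """Re-estimate mixture parameters of one state from occupancies.

    obs[t]    : K-dimensional feature vector of frame t
    occ[t][k] : occupancy of mixture element k at frame t for this state
    Returns (weights, means, variances) with diagonal variances (assumption).
    """
    T, n_mix, dim = len(obs), len(occ[0]), len(obs[0])
    totals = [sum(occ[t][k] for t in range(T)) for k in range(n_mix)]
    state_total = sum(totals)
    weights = [g / state_total for g in totals]
    means, variances = [], []
    for k in range(n_mix):
        mu = [sum(occ[t][k] * obs[t][d] for t in range(T)) / totals[k]
              for d in range(dim)]
        var = [sum(occ[t][k] * (obs[t][d] - mu[d]) ** 2 for t in range(T)) / totals[k]
               for d in range(dim)]
        means.append(mu)
        variances.append(var)
    return weights, means, variances
```

In a full training loop this step would alternate with the recomputation of the occupancies until the likelihood stops improving.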
[0020] 

[Problem(s) to be Solved by the Invention] As mentioned above, output probability distributions
can be expressed as discrete or continuous distributions. Of the two, continuous distributions,
and among them Gaussian mixture distributions in particular, are often used. Gaussian mixture
distributions are used because they give excellent performance as an expression of the output
probability distribution.

[0021] When a Gaussian mixture distribution (hereafter, mixture distribution) is used, there is no
clear guideline for how large the number of element distributions should be. Usually, the number
of element distributions per state is assumed to be the same for all states, several numbers of
element distributions are tried for the mixture-distribution HMM, and the number giving the
highest performance among them is chosen.

[0022] However, the required number of element distributions is expected to differ from state to
state. For example, having many unnecessary element distributions increases the amount of
computation needed to calculate the probabilities of the element distributions. Moreover, in a
state that appears only rarely, overfitting may occur during parameter estimation and the
performance on unseen data may deteriorate. It is therefore desirable that the number of element
distributions in each state of a mixture-distribution HMM be optimized state by state.

[0023] The simplest method of optimizing the number of element distributions per state is to vary
the number for each state, run recognition experiments, and choose for each state the number that
gives the highest recognition performance. However, the number of states of the HMMs is usually
very large, 1000 to 10000 in total, and optimizing the number of element distributions for every
state in this way is practically impossible in terms of the amount of computation.

[0024] This invention was made against this background, and its object is to provide a voice
recognition unit which adjusts the number of element distributions quickly and effectively in a
probability model using a mixture distribution.
[0025] 

[Means for Solving the Problem] A voice recognition unit of this invention is a voice recognition
unit using a probability model based on a mixture distribution, and is characterized by comprising
a standard-pattern storage means which holds a standard pattern, a recognition means which takes
voice as input and outputs a recognition result using the standard pattern, a standard-pattern
creation means which takes training voice as input and creates the standard pattern, and a
standard-pattern adjustment means which adjusts the number of element distributions of the mixture
distribution of the standard pattern.

[0026] A voice recognition unit of this invention is a voice recognition unit using a probability
model based on a mixture distribution, and is characterized by comprising a standard-pattern
storage means which holds a standard pattern, a recognition means which takes voice as input and
outputs a recognition result using the standard pattern, a standard-pattern correction means which
takes adaptation voice as input and corrects the standard pattern, and a standard-pattern
adjustment means which adjusts the number of element distributions of the mixture distribution of
the standard pattern.

[0027] A voice recognition unit of this invention is characterized by comprising a
standard-pattern adjustment means which consists of a tree structure creation means which creates
a tree structure of the element distributions, and an element distribution selection means which
selects distributions using training data as input. A voice recognition unit of this invention is
characterized in that said standard-pattern adjustment means comprises a minimax distribution
selection means which uses a minimax method to select element distributions.

[0028] A voice recognition unit of this invention is characterized in that said element
distribution selection means uses the amount of training data corresponding to each element
distribution as the selection criterion when selecting element distributions. A voice recognition
unit of this invention is characterized in that said element distribution selection means uses the
minimum description length criterion as the selection criterion when selecting element
distributions. A voice recognition unit of this invention is characterized in that said element
distribution selection means uses the Akaike information criterion as the selection criterion when
selecting element distributions.

[0029] A voice recognition unit of this invention is characterized in that said tree structure
creation means uses divergence as the distance between distributions when selecting element
distributions. A voice recognition unit of this invention is characterized in that said tree
structure creation means uses the likelihood with respect to the training data as the distance
between distributions. A voice recognition unit of this invention is characterized by using a
hidden Markov model as the probability model using a mixture distribution.
[0030] 

[Embodiment of the Invention] Hereafter, an embodiment of this invention is explained with
reference to the drawings. Drawing 1 is a block diagram showing the configuration of a voice
recognition unit according to one embodiment of this invention. It differs from the conventional
example of drawing 2 in that the standard-pattern adjustment means 203 is inserted between the
standard-pattern creation means 102 and the standard-pattern storage means 103. In the block
diagram of drawing 1, the blocks that are the same as in the voice recognition unit of drawing 2
(the input pattern creation means 101, the standard-pattern creation means 102, the
standard-pattern storage means 103 and the recognition means 104) bear the same reference
numerals, and their detailed explanation is omitted.

[0031] In this drawing, the input pattern creation means 101 creates an input pattern from the
input voice (the sound signal produced by the speaker). The standard-pattern creation means 102
creates a standard pattern as described in the explanation of the conventional example. The
standard-pattern adjustment means 203 changes the number of element distributions of the created
standard pattern. The standard-pattern storage means 103 stores the created standard pattern, and
the recognition means 104 recognizes the input voice using the standard pattern and outputs a
recognition result.

[0032] The operation of the standard-pattern adjustment means 203 added in this embodiment is
explained in detail below. The problem of optimizing the number of element distributions in each
state of a hidden Markov model (HMM) can be regarded as the problem of choosing the optimal
probability model for the given data. Various information criteria have been proposed in the past
for this kind of probability model selection.

[0033] This embodiment considers a method of optimizing the number of distributions using one of
them, MDL (minimum description length). First, the MDL criterion is explained. Recent research in
information theory and computational learning theory has shown that the minimum description length
(MDL) criterion is effective for the problem of choosing the optimal probability model for given
data.

[0034] The MDL criterion is explained, for example, in Han and Kobayashi, "Mathematics of
Information and Coding", Iwanami Lecture Series on Applied Mathematics 11, Iwanami Shoten (1994),
pp. 249-275 (hereafter, reference 2). Like AIC (Akaike information criterion), it is one of the
criteria that embody the idea that a good model is one that is as simple as possible and yet
expresses the given data well.

[0035] Under the MDL criterion, among the probability models i = 1, ..., I, the model that gives
the smallest description length for the data $x^N = x_1, \ldots, x_N$ is taken as the optimal
model. The description length $l_{MDL}(i)$ for probability model i is given by equation (18) below.
[Equation 18]

$$l_{MDL}(i) = -\log P_{\hat{\theta}^{(i)}}(x^N) + \frac{\alpha_i}{2} \log N + \log I \qquad (18)$$

Here $\alpha_i$ is the number of dimensions (the number of free parameters) of model i, and
$\hat{\theta}^{(i)} = (\hat{\theta}_1^{(i)}, \ldots, \hat{\theta}_{\alpha_i}^{(i)})$ is the maximum
likelihood estimate of the free parameters of model i, estimated using the data $x^N$.

[0036] In equation (18), the first term is the log-likelihood of the data (hereafter simply the
likelihood) with its sign reversed, the second term expresses the complexity of the model, and the
third term is the description length needed to choose model i. The more complex the model, the
larger the likelihood of the data, and therefore the smaller the value of the first term. On the
other hand, as the model becomes more complex the number of free parameters increases, so the
value of the second term increases. There is thus a trade-off between the first and second terms,
and the description length $l_{MDL}(i)$ is expected to take its minimum value for a model of
suitable complexity.
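The trade-off can be illustrated with a small sketch that evaluates the three terms of the description length for a few candidate models and picks the smallest. The candidate names and the numeric values used in the usage note are made up purely for illustration; `description_length` and `select_model` are hypothetical names.

```python
import math

def description_length(neg_log_likelihood, num_params, num_samples, num_models):
    """MDL description length: fit term + complexity term + model index term."""
    return (neg_log_likelihood
            + 0.5 * num_params * math.log(num_samples)
            + math.log(num_models))

def select_model(candidates, num_samples):
    """Pick the candidate with the smallest description length.

    candidates: list of (name, neg_log_likelihood, num_params) tuples.
    """
    n_models = len(candidates)
    best = min(candidates,
               key=lambda c: description_length(c[1], c[2], num_samples, n_models))
    return best[0]
```

For example, with 1000 samples and hypothetical candidates `("1-mix", 520.0, 2)`, `("2-mix", 500.0, 5)` and `("8-mix", 495.0, 23)`, the small gain in likelihood of the largest model does not pay for its extra parameters, and `select_model` returns `"2-mix"`.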

[0037] The algorithm for optimizing the number of element distributions per state using this MDL
criterion is as follows. First, a Gaussian-mixture HMM is trained on the training data by the
usual procedure. At this point the number of element distributions is the same for all states; the
number of element distributions is increased, and an HMM is trained with the number regarded as
the upper limit. During training, the occupancy frequency $\gamma'_t(i,k)$ of each element
distribution is saved. Here i is the subscript of the state and k is the subscript of the element
distribution within the state.

[0038] Next, the standard-pattern adjustment means 203 optimizes the number of element
distributions in each state. From here on, only one state i is explained, and the state subscript
i is omitted. The standard-pattern adjustment means 203 performs the same processing for the other
states. First, the standard-pattern adjustment means 203 creates a tree structure of the element
distributions for each state, using an internal tree structure creation means. The root is a
single distribution and the leaves are the individual element distributions.

[0039] Various methods of creating the tree structure of element distributions can be considered;
here a binary tree is created using the k-means algorithm. The Kullback divergence is used as the
distance between element distributions (inter-distribution distance). The Kullback divergence can
be calculated easily from the means and covariances of the Gaussian distributions. This method of
creating the tree structure of element distributions is described in detail in patent No.
002531073 and in the above-mentioned reference 2.
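As an illustration of how such a distance can be computed from means and covariances alone, the sketch below evaluates the Kullback-Leibler divergence between two Gaussians. It assumes diagonal covariances, and the symmetrised form (the sum of both directions) is one common choice that is assumed here; the text does not state which form is used. Function names are hypothetical.

```python
import math

def kl_divergence(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for diagonal-covariance Gaussians (assumption)."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_divergence(mean_p, var_p, mean_q, var_q):
    """Symmetrised divergence KL(p || q) + KL(q || p), usable as a distance."""
    return (kl_divergence(mean_p, var_p, mean_q, var_q)
            + kl_divergence(mean_q, var_q, mean_p, var_p))
```

A clustering procedure such as k-means can then use `symmetric_divergence` in place of the Euclidean distance when grouping element distributions.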

[0040] Next, the standard-pattern adjustment means 203 obtains the variance of the distribution at
each node of the tree structure (node distribution). The variance of each node distribution is
obtained from the occupancy frequencies and the Gaussian distribution parameters of the element
distributions of all the leaves below it. A set of node distributions that divides the tree
structure into an upper and a lower part is called a "cut". There are many possible cuts, and each
cut represents one probability model for that state. The optimal cut is sought using the MDL
criterion.
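One way to obtain a node distribution from its leaves is moment matching, sketched below. This is an assumption for illustration (the text only says that the occupancy frequencies and Gaussian parameters of the leaves are used), with diagonal variances and a hypothetical function name `merge_leaves`.

```python
def merge_leaves(leaves):
    """Merge leaf Gaussians into one node distribution by moment matching.

    leaves: list of (occupancy, mean, var) with diagonal variances (assumption).
    Returns (total_occupancy, mean, var) of the node distribution.
    """
    total = sum(occ for occ, _, _ in leaves)
    dim = len(leaves[0][1])
    mean = [sum(occ * mu[d] for occ, mu, _ in leaves) / total
            for d in range(dim)]
    # Second moment of each leaf is var + mean^2; subtract the merged mean squared.
    var = [sum(occ * (v[d] + mu[d] ** 2) for occ, mu, v in leaves) / total
           - mean[d] ** 2
           for d in range(dim)]
    return total, mean, var
```

Because only occupancies and leaf parameters are needed, the node distributions can be computed without revisiting the training data.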
[0041] For example, the description length of a certain cut U is calculated as follows. Let the
node distributions constituting cut U be $S_1, \ldots, S_M$, where M is the number of node
distributions in cut U. The likelihood $L(S_m)$ of the data for distribution $S_m$ can then be
approximated by equations (19) and (20) below.
[Equation 19]

$$L(S_m) = \sum_{t=1}^{T} \sum_{s} \log N(o_t; \mu_{S_m}, \Sigma_{S_m})\, \gamma'_t(s) \approx -\frac{1}{2} \left( \log\left( (2\pi)^K |\Sigma_{S_m}| \right) + K \right) \Gamma(S_m) \qquad (19)$$

[0042] In equation (19),
[Equation 20]

$$\Gamma(S_m) = \sum_{t=1}^{T} \sum_{s} \gamma'_t(s) \qquad (20)$$

where s ranges over all leaf distributions below distribution $S_m$, and K is the number of
dimensions of the mean vectors and variances used for the shared standard pattern. Moreover, in
equation (19), $\mu_{S_m}$ and $\Sigma_{S_m}$ are the mean vector and the variance of distribution
$S_m$, respectively.

[0043] Using the above results, the description length l(U) of cut U can be written as equation
(21) below.
[Equation 21]

$$l(U) = -\sum_{m=1}^{M} L(S_m) + \frac{1}{2} \cdot 2KM \log \sum_{m=1}^{M} \Gamma(S_m)$$

$$\phantom{l(U)} = \frac{1}{2} \sum_{m=1}^{M} \Gamma(S_m) \log |\Sigma_{S_m}| + KM \log V + \frac{K}{2} \left( 1 + \log(2\pi) \right) V \qquad (21)$$

where
[Equation 22]

$$V = \sum_{m=1}^{M} \Gamma(S_m) \qquad (22)$$

V is a quantity corresponding to the total number of frames of the data assigned to U, and is a
constant that does not depend on how the tree is divided.
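The description length of a cut can be evaluated directly from the occupancies and variances of its node distributions. The sketch below assumes diagonal variances, so that the log determinant is the sum of the log variances and each node contributes 2K free parameters; the function name `cut_description_length` is hypothetical.

```python
import math

def cut_description_length(nodes, dim):
    """Description length l(U) of a cut.

    nodes: list of (occupancy Gamma(S_m), diagonal variances of S_m)
    dim  : feature dimension K
    """
    m = len(nodes)
    v = sum(occ for occ, _ in nodes)  # V, total occupancy of the state
    fit = 0.5 * sum(occ * sum(math.log(x) for x in var) for occ, var in nodes)
    complexity = dim * m * math.log(v)
    constant = 0.5 * dim * (1.0 + math.log(2.0 * math.pi)) * v
    return fit + complexity + constant
```

Since the last term depends only on V, comparing two cuts of the same state reduces to comparing the fit and complexity terms.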
[0044] The standard-pattern adjustment means 203 calculates the description length l(U) for all
possible cuts and chooses the cut U with the smallest l(U). However, the number of possible
divisions, that is, the number of cuts U, is usually very large. The amount of computation for
selecting the cut U is therefore reduced by using the following algorithm. The optimization of the
number of element distributions for a certain state p is described below.

[0045] First, a node is created for state p. This node is called the root node. The distribution
parameters of the root node are estimated from all the data samples corresponding to all the
element distributions of this state. For example, when the tree structure is a binary tree, the
distribution of the root node is $S_0$ and the distributions of its two child nodes are $S_1$ and
$S_2$, the change in description length when the parent node is expanded into its child nodes is
given by equation (23) below.
[Equation 23]

$$\Delta = l(S_1, S_2) - l(S_0) = \frac{1}{2} \left( \Gamma(S_1) \log |\Sigma_{S_1}| + \Gamma(S_2) \log |\Sigma_{S_2}| - \Gamma(S_0) \log |\Sigma_{S_0}| \right) + K \log V \qquad (23)$$



[0046] For example, the standard-pattern adjustment means 203 expands the parent node when
$\Delta < 0$ and does not expand it when $\Delta > 0$. When a node has been expanded, the same
processing is repeated for each of the child nodes $S_1$ and $S_2$: the change in description
length on expanding the child node is calculated, and it is decided whether to expand it. When the
expansion of all nodes has finished, the set of nodes at the ends of the expansion forms the cut,
and their node distributions are selected as the element distributions. A Gaussian-mixture HMM
having only the newly selected distributions as element distributions is then created, and the
element distributions are retrained with the training data.
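The top-down expansion can be sketched as follows. The sketch assumes a binary tree whose nodes already carry their occupancy and diagonal variances; the class `Node` and the functions `delta` and `select_cut` are hypothetical names, and the retraining step is not shown.

```python
import math

class Node:
    def __init__(self, occ, var, children=()):
        self.occ = occ                  # Gamma(S): occupancy summed over leaves below
        self.var = var                  # diagonal variances of the node distribution
        self.children = list(children)  # empty for a leaf, two nodes otherwise

def _fit(node):
    """Gamma(S) * log|Sigma_S| for a diagonal covariance."""
    return node.occ * sum(math.log(v) for v in node.var)

def delta(parent, total_occ):
    """Change in description length when a parent is expanded into two children."""
    c1, c2 = parent.children
    dim = len(parent.var)
    return 0.5 * (_fit(c1) + _fit(c2) - _fit(parent)) + dim * math.log(total_occ)

def select_cut(root):
    """Expand nodes from the root while expansion shortens the description."""
    total_occ = root.occ  # V is constant for the state
    cut, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.children and delta(node, total_occ) < 0:
            stack.extend(node.children)
        else:
            cut.append(node)
    return cut
```

Each node is visited at most once, so the cost grows with the number of nodes in the tree rather than with the number of possible cuts.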

[0047] The above is the explanation of the voice recognition unit of the embodiment shown in
drawing 1. Although a hidden Markov model (HMM) was used as the example, the invention can also
easily be applied when the model is a Gaussian mixture distribution. This corresponds to the
invention of claim 10. Moreover, although the embodiment described above explained acoustic model
training, the number of element distributions can also be adjusted using the adaptation data when
speaker adaptation, which corrects the standard pattern using a small amount of a user's
utterances, is performed. In this case, the voice recognition unit uses a standard-pattern
correction means instead of the standard-pattern creation means, and the input voice to this
standard-pattern correction means is the voice of the same speaker as the one whose voice is
supplied to the input pattern creation means for recognition.

[0048] In the embodiment described above, the adjustment of the number of element distributions by
a tree structure was explained; adjustment can also be performed by a minimax distribution
selection means using a minimax method, as follows. One state is explained below. First, the set
of distributions that appeared in the training data at least a given number of times (X times) is
denoted A, and the set of the remaining distributions is denoted B. All the distances between
distributions belonging to A and distributions belonging to B are calculated, and among the
distributions of B, the one whose distance to its nearest distribution in A is largest is removed.
[0049] Next, among the remaining distributions of B, the one whose distance to its nearest
distribution in A is largest is removed. This procedure is repeated until the number of
distributions reaches a predetermined minimum number of distributions. If the number of
distributions cannot be reduced to the minimum number (that is, if the number of distributions in
B is small), the processing stops at that point. The above corresponds to the invention of claim 4.
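The minimax procedure for one state can be sketched as below. The sketch follows the steps as described, with the distance function passed in by the caller; the function and parameter names are hypothetical, and the guard for an empty set A is an added assumption.

```python
def minimax_prune(components, counts, min_count, min_components, distance):
    """Prune rarely used element distributions of one state.

    components     : element distributions (any objects accepted by `distance`)
    counts         : number of times each distribution appeared in the training data
    min_count      : threshold X separating set A (frequent) from set B (rare)
    min_components : minimum number of distributions to keep
    """
    keep_a = [c for c, n in zip(components, counts) if n >= min_count]
    pool_b = [c for c, n in zip(components, counts) if n < min_count]
    # Stop when the minimum is reached, B is exhausted, or A is empty (assumption).
    while pool_b and keep_a and len(keep_a) + len(pool_b) > min_components:
        # Remove the member of B whose nearest neighbour in A is farthest away.
        farthest = max(pool_b, key=lambda b: min(distance(b, a) for a in keep_a))
        pool_b.remove(farthest)
    return keep_a + pool_b
```

With scalar stand-ins for distributions, `minimax_prune([0.0, 1.0, 5.0, 9.0], [100, 3, 2, 120], 10, 3, lambda x, y: abs(x - y))` keeps `0.0`, `9.0` and `1.0` and drops `5.0`, which is the rare distribution farthest from its nearest frequent one.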
[0050] Moreover, although the MDL criterion was used to select nodes in the embodiment, a
threshold on the amount of data can also be used. That is, among the distributions whose amount of
data is at or above a threshold, the set of distributions nearest to the leaves is taken as the
cut. The above corresponds to the invention of claim 5.
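A possible reading of the data-amount threshold is sketched below: a node is expanded only when every child still has at least the threshold amount of data, so that the resulting cut lies as close to the leaves as the threshold allows. This interpretation, the dictionary layout and the name `threshold_cut` are assumptions made for illustration.

```python
def threshold_cut(node, threshold):
    """Select a cut using a data-amount threshold.

    node: dict with 'occ' (amount of data) and 'children' (list of nodes).
    A node is expanded only if all of its children meet the threshold (assumption).
    """
    kids = node["children"]
    if kids and all(k["occ"] >= threshold for k in kids):
        return [n for k in kids for n in threshold_cut(k, threshold)]
    return [node]
```

Compared with the MDL-based selection, this variant needs only the occupancies and no variances.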

[0051] Furthermore, although the embodiment explained only the case where the MDL criterion is
used as the information criterion, the invention can easily be applied when the Akaike information
criterion (AIC) or another similar information criterion is used. The above corresponds to the
invention of claim 7.
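As a sketch of how the node expansion test might look under AIC, the code below uses the standard definition AIC = -2 log L + 2 (number of free parameters). The expansion test is derived here under the assumptions of diagonal covariances and 2K free parameters per node distribution; it is not taken from the text, and the function names are hypothetical.

```python
import math

def aic(neg_log_likelihood, num_params):
    """Akaike information criterion from a negative log-likelihood."""
    return 2.0 * neg_log_likelihood + 2.0 * num_params

def aic_delta(parent, child1, child2):
    """Change in AIC when a parent node is expanded into two children.

    Each argument is (occupancy, diagonal variances). Expanding adds one node,
    i.e. 2K parameters (assumption), so the penalty grows by 4K.
    """
    def fit(node):
        occ, var = node
        return occ * sum(math.log(v) for v in var)
    dim = len(parent[1])
    return fit(child1) + fit(child2) - fit(parent) + 4.0 * dim
```

Unlike the MDL penalty, the AIC penalty does not grow with the amount of data, so with large training sets it tends to keep more element distributions.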
[0052] In addition, although divergence was used as the distance between distributions in the
embodiment, the change in likelihood when distributions are shared can also be used as the
distance value. The above corresponds to the invention of claim 9.

[0053] Although one embodiment of this invention has been explained in detail above with reference
to the drawings, the concrete configuration is not restricted to this embodiment, and design
changes within a range that does not deviate from the gist of this invention are included in this
invention.
[0054]

[Effect of the Invention] According to the voice recognition unit of this invention, in pattern
recognition using Gaussian mixture distributions, the newly added parameter adjustment means
optimizes the number of element distributions of the voice standard pattern, that is, adjusts the
number of element distributions for each state of the HMM so that the recognition performance
becomes high. Unnecessary element distributions can thereby be excluded, deterioration of
performance on unseen voice data due to overfitting is prevented, and high-performance voice
recognition becomes possible.






DESCRIPTION OF DRAWINGS 



[Brief Description of the Drawings] 

[Drawing 1] It is a block diagram showing the configuration of a voice recognition unit according
to one embodiment of this invention.

[Drawing 2] It is a block diagram showing the configuration of a voice recognition unit according
to the conventional example.
[Description of Notations] 

101 Input Pattern Creation Means

102 Standard-Pattern Creation Means

103 Standard-Pattern Storage Means

104 Recognition Means

203 Standard-Pattern Adjustment Means


